Supplementary Materials for the Paper "Bandit Samplers for Training Graph Neural Networks"

Neural Information Processing Systems

We show convergence on the validation set in terms of wall-clock time (seconds) in Figure 1 and Figure 2. Our algorithms converge to much better results in nearly the same duration as the baselines. Note that we could not complete the training of AS-GAT on Reddit because of memory issues. Note also that timing comparisons between the "graph sampling" and "layer sampling" paradigms are not directly meaningful: graph sampling approaches are designed for graph data in which all vertices have labels. As a result, we do not compare timing with "graph sampling" approaches. To summarize, the "layer sampling" approaches are more flexible and general than the "graph sampling" approaches.

Before we give the proof of Theorem 1, we first prove the following Lemma 1, which will be used later.


Bandit Samplers for Training Graph Neural Networks

Liu, Ziqi, Wu, Zhengwei, Zhang, Zhiqiang, Zhou, Jun, Yang, Shuang, Song, Le, Qi, Yuan

arXiv.org Machine Learning

Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs). However, due to the intractable computation of the optimal sampling distribution, these sampling algorithms are suboptimal for GCNs and are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT). The fundamental reason is that the embeddings of the neighbors or the learned weights involved in the optimal sampling distribution change during training and are not known a priori, but only partially observed when sampled, thus making the derivation of an optimal variance-reduced sampler non-trivial. In this paper, we formulate the optimization of the sampling variance as an adversary bandit problem, where the rewards are related to the node embeddings and learned weights, and can vary constantly. Thus a good sampler needs to acquire variance information about more neighbors (exploration) while at the same time optimizing the immediate sampling variance (exploitation). We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We show the efficiency and effectiveness of our approach on multiple datasets.
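To make the exploration/exploitation trade-off concrete, the sketch below shows a generic EXP3-style adversarial bandit sampler over a node's neighbors. This is a minimal illustration, not the paper's algorithm: the class name, the `gamma` parameter, and the idea of feeding back a reward derived from the sampled neighbor's embedding are assumptions for the example; in the paper the rewards are tied to node embeddings and learned weights, which are only observed for the sampled neighbor.

```python
import math
import random


class Exp3Sampler:
    """EXP3-style adversarial bandit over a node's neighbors (sketch).

    Each arm corresponds to one neighbor. After sampling an arm, the
    caller computes a reward (e.g., derived from the sampled neighbor's
    embedding) and feeds it back via `update`, which applies an
    importance-weighted exponential update to that arm only.
    """

    def __init__(self, n_arms, gamma=0.1):
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms   # one weight per neighbor

    def probabilities(self):
        # Mix the weight-proportional distribution (exploitation)
        # with a uniform distribution (exploration).
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def sample(self):
        # Draw one neighbor index according to the mixed distribution.
        probs = self.probabilities()
        r, acc = random.random(), 0.0
        for arm, p in enumerate(probs):
            acc += p
            if r < acc:
                return arm, p
        return self.n_arms - 1, probs[-1]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate: only the sampled arm
        # (the observed neighbor) receives feedback.
        est = reward / prob
        self.weights[arm] *= math.exp(self.gamma * est / self.n_arms)
```

A training loop would call `sample()` to pick which neighbor to aggregate, then call `update()` with a reward reflecting that neighbor's contribution to the variance, so that over time high-reward neighbors are sampled more often while every neighbor keeps a nonzero exploration probability.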